Social Events


Watch: Chris Martin surprises couple with performance at their wedding

BBC News

Coldplay's Chris Martin made a surprise appearance at a couple's wedding to play the music for their first dance. The groom's mother had asked the singer for a video message to be played at the wedding of Abbie and James Hotchkiss from Stafford. He went one better, though, and said he would appear in person, with only the newlyweds and the groom's parents in on the secret. Surprised guests saw him walk into the wedding venue, Blithfield Lakeside Barns in Staffordshire, wearing a white beanie hat to perform All My Love at the piano while the couple danced. Guests "took a while to notice it was actually him, but didn't want to ruin our wedding day so asked us loads of questions once he'd gone", Mrs Hotchkiss said.


PairHuman: A High-Fidelity Photographic Dataset for Customized Dual-Person Generation

Pan, Ting, Wang, Ye, Jing, Peiguang, Ma, Rui, Yi, Zili, Liu, Yu

arXiv.org Artificial Intelligence

Personalized dual-person portrait customization has considerable potential applications, such as preserving emotional memories and facilitating wedding photography planning. However, the absence of a benchmark dataset hinders the pursuit of high-quality customization in dual-person portrait generation. In this paper, we propose the PairHuman dataset, the first large-scale benchmark dataset specifically designed for generating dual-person portraits that meet high photographic standards. The PairHuman dataset contains more than 100K images that capture a variety of scenes, attire, and dual-person interactions, along with rich metadata, including detailed image descriptions, person localization, human keypoints, and attribute tags. We also introduce DHumanDiff, a baseline specifically crafted for dual-person portrait generation that features enhanced facial consistency while balancing personalized person generation with semantic-driven scene creation. Finally, the experimental results demonstrate that our dataset and method produce highly customized portraits with superior visual quality that are tailored to human preferences. Our dataset is publicly available at https://github.com/annaoooo/PairHuman.
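The abstract lists several kinds of per-image metadata (descriptions, person localization, keypoints, attribute tags). Below is a minimal sketch of what one such record could look like; the field names, values, and file layout are hypothetical illustrations, not the actual PairHuman schema.

```python
import json

# Hypothetical metadata record for one dual-person image; all field names and
# values are illustrative only and do not reflect the actual PairHuman schema.
record = {
    "image": "pair_000123.jpg",
    "caption": "A couple in formal attire posing in front of a lakeside barn.",
    "persons": [
        {
            "bbox": [120, 45, 310, 620],            # person localization (x, y, w, h)
            "keypoints": [[180, 90], [175, 130]],    # subset of 2D human keypoints
            "attributes": ["female", "white dress"],
        },
        {
            "bbox": [330, 50, 300, 615],
            "keypoints": [[390, 95], [395, 135]],
            "attributes": ["male", "dark suit"],
        },
    ],
}

# A loader for such records might simply read one JSON file per image.
def load_record(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

print(record["caption"], "-", len(record["persons"]), "persons")
```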


Explicit and Implicit Data Augmentation for Social Event Detection

Ma, Congbo, Wang, Yuxia, Wu, Jia, Yang, Jian, Du, Jing, Qiu, Zitai, Li, Qing, Wang, Hu, Nakov, Preslav

arXiv.org Artificial Intelligence

Social event detection involves identifying and categorizing important events from social media, which relies on labeled data, but annotation is costly and labor-intensive. To address this problem, we propose the Augmentation framework for Social Event Detection (SED-Aug), a plug-and-play dual augmentation framework that combines explicit text-based and implicit feature-space augmentation to enhance data diversity and model robustness. The explicit augmentation utilizes large language models to enhance textual information through five diverse generation strategies. For implicit augmentation, we design five novel perturbation techniques that operate in the feature space on structural fused embeddings. These perturbations are crafted to preserve the semantic and relational properties of the embeddings while making them more diverse. In our experiments, SED-Aug outperforms the best baseline model by approximately 17.67% on the Twitter2012 dataset and by about 15.57% on the Twitter2018 dataset in terms of the average F1 score. The code is available at GitHub: https://github.com/congboma/SED-Aug.
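To make the implicit, feature-space side of the framework concrete, here is a minimal sketch of generic embedding perturbations (noise, scaling, dropout, interpolation), assuming the fused embeddings are available as NumPy arrays. These are illustrative stand-ins, not the five specific techniques proposed in SED-Aug.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic feature-space perturbations over fused embeddings (n_samples x dim).
# Illustrative stand-ins only, not the five techniques from the SED-Aug paper.
def gaussian_noise(emb: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    return emb + rng.normal(0.0, sigma, size=emb.shape)

def random_scaling(emb: np.ndarray, low: float = 0.9, high: float = 1.1) -> np.ndarray:
    return emb * rng.uniform(low, high, size=(emb.shape[0], 1))

def feature_dropout(emb: np.ndarray, p: float = 0.1) -> np.ndarray:
    mask = rng.random(emb.shape) > p
    return emb * mask

def interpolation(emb: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # Mix each embedding with a randomly chosen one from the same batch,
    # which tends to preserve coarse semantic and relational structure.
    partner = emb[rng.permutation(emb.shape[0])]
    return (1 - alpha) * emb + alpha * partner

embeddings = rng.normal(size=(8, 16))            # stand-in fused embeddings
augmented = [f(embeddings) for f in (gaussian_noise, random_scaling,
                                     feature_dropout, interpolation)]
print(len(augmented), augmented[0].shape)
```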


SIGMUS: Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces

Wang, Brian, Srivastava, Mani

arXiv.org Artificial Intelligence

Modern urban spaces are equipped with an increasingly diverse set of sensors, all producing an abundance of multimodal data. Such multimodal data can be used to identify and reason about important incidents occurring in urban landscapes, such as major emergencies, cultural and social events, as well as natural disasters. However, such data may be fragmented over several sources and difficult to integrate due to the reliance on human-driven reasoning for identifying relationships between the multimodal data corresponding to an incident, as well as understanding the different components which define an incident. Such relationships and components are critical to identifying the causes of incidents, as well as forecasting the scale and intensity of future incidents as they begin to develop. In this work, we create SIGMUS, a system for Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces. SIGMUS uses Large Language Models (LLMs) to produce the necessary world knowledge for identifying relationships between incidents occurring in urban spaces and data from different modalities, allowing us to organize evidence and observations relevant to an incident without relying on human-encoded rules for relating multimodal sensory data with incidents. This knowledge is represented as a knowledge graph that organizes incidents, observations, and more. We find that our system is able to produce reasonable connections between five different data sources (news article text, CCTV images, air quality, weather, and traffic measurements) and relevant incidents occurring at the same time and location.
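A rough sketch of the kind of knowledge graph described above, assuming networkx and hand-written example nodes: an incident node is linked to observations from several modalities. In SIGMUS such links come from LLM reasoning; here they are hard-coded purely for illustration.

```python
import networkx as nx

# Minimal sketch: organize multimodal observations around an incident node.
# Node IDs, attributes, and edge labels are illustrative; SIGMUS derives such
# links with an LLM rather than the hand-written rules shown here.
kg = nx.MultiDiGraph()
kg.add_node("incident:road_closure_0412", kind="incident",
            time="2024-04-12T14:00", location="5th & Main")

observations = [
    ("news:article_88", "text", "Parade announced downtown for Saturday"),
    ("cctv:cam_12_frame_301", "image", "Dense crowd on 5th Avenue"),
    ("traffic:sensor_7", "timeseries", "Vehicle counts drop sharply at 14:05"),
    ("air:station_3", "timeseries", "PM2.5 stable"),
    ("weather:station_1", "timeseries", "Clear, 18 C"),
]

for node_id, modality, summary in observations:
    kg.add_node(node_id, kind="observation", modality=modality, summary=summary)
    # In SIGMUS this relation would be proposed by an LLM; here it is fixed.
    kg.add_edge(node_id, "incident:road_closure_0412", relation="evidence_for")

print(kg.number_of_nodes(), "nodes,", kg.number_of_edges(), "edges")
```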


Help! My Husband's Best Man Made a Stunning Admission During His Wedding Speech. I Might Never Get Over It.

Slate

Dear Prudence is Slate's advice column. For this edition, Hillary Frey, Slate's editor-in-chief, will be filling in as Prudie. My partner of five years and I just got married after two years of extensive wedding planning and preparation. We had a very large guest list with a variety of needs that had to be taken into account, such as international travel and physical limitations, and I feel grateful that my husband was very intentional about making sure the labor of wedding planning was split as equitably as possible between the two of us. We agreed that we wanted to write our own vows because we thought it was more meaningful than using traditional ones.


NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization

Kim, Hyuntak, Kim, Byung-Hak

arXiv.org Artificial Intelligence

Summarizing long-form narratives--such as books, movies, and TV scripts--requires capturing intricate plotlines, character interactions, and thematic coherence, a task that remains challenging for existing LLMs. We introduce NexusSum, a multi-agent LLM framework for narrative summarization that processes long-form text through a structured, sequential pipeline--without requiring fine-tuning. Our approach introduces two key innovations: (1) Dialogue-to-Description Transformation: A narrative-specific preprocessing method that standardizes character dialogue and descriptive text into a unified format, improving coherence. (2) Hierarchical Multi-LLM Summarization: A structured summarization pipeline that optimizes chunk processing and controls output length for accurate, high-quality summaries. Our method establishes a new state-of-the-art in narrative summarization, achieving up to a 30.0% improvement in BERTScore (F1) across books, movies, and TV scripts. These results demonstrate the effectiveness of multi-agent LLMs in handling long-form content, offering a scalable approach for structured summarization in diverse storytelling domains.
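A minimal sketch of a chunk-then-merge pipeline in the spirit of the description above. The call_llm function, chunk size, and prompts are assumptions for illustration and do not reflect NexusSum's actual agents or configuration.

```python
# Sketch of a hierarchical, chunk-then-merge summarization pipeline.
# `call_llm` is a hypothetical stand-in for whatever LLM client is available.
from typing import List

def call_llm(prompt: str) -> str:            # hypothetical LLM call
    raise NotImplementedError("plug in your LLM client here")

def chunk_text(text: str, max_chars: int = 8000) -> List[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize_chunk(chunk: str, target_words: int = 150) -> str:
    return call_llm(f"Summarize the following narrative passage in about "
                    f"{target_words} words, preserving plot and characters:\n{chunk}")

def hierarchical_summarize(text: str, target_words: int = 300) -> str:
    summaries = [summarize_chunk(c) for c in chunk_text(text)]
    merged = "\n".join(summaries)
    # Recurse until the merged partial summaries fit into one final pass,
    # which also gives coarse control over the output length.
    if len(merged) > 8000:
        return hierarchical_summarize(merged, target_words)
    return call_llm(f"Combine these partial summaries into one coherent summary "
                    f"of about {target_words} words:\n{merged}")
```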


$\texttt{DIAMONDs}$: A Dataset for $\mathbb{D}$ynamic $\mathbb{I}$nformation $\mathbb{A}$nd $\mathbb{M}$ental modeling $\mathbb{O}$f $\mathbb{N}$umeric $\mathbb{D}$iscussions

Ghosh, Sayontan, Koupaee, Mahnaz, Lal, Yash Kumar, Alipoormolabashi, Pegah, Hasan, Mohammad Saqib, Kang, Jun Seok, Balasubramanian, Niranjan

arXiv.org Artificial Intelligence

Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs with these characteristics. Using this methodology, we create $\texttt{DIAMONDs}$, a new conversational QA dataset covering common business, financial or other group interactions. In these goal-oriented conversations, participants often have to track certain numerical quantities (say $\textit{expected profit}$) of interest that can be derived from other variable quantities (like $\textit{marketing expenses, expected sales, salary}$, etc.), whose values also change over the course of the conversation. $\texttt{DIAMONDs}$ questions pose simple numerical reasoning problems over such quantities of interest (e.g., $\textit{funds required for charity events, expected company profit next quarter}$, etc.) in the context of the information exchanged in conversations. This allows for precisely evaluating ToM capabilities for carefully tracking and reasoning over participants' knowledge states. Our evaluation of state-of-the-art language models reveals significant challenges in handling participant-centric reasoning, specifically in situations where participants have false beliefs. Models also struggle with conversations containing distractors and show limited ability to identify scenarios with insufficient information. These findings highlight current models' ToM limitations in handling real-world multi-party conversations.
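A toy example of the bookkeeping such questions require, with invented variables and numbers: values are revised as the conversation progresses, and the quantity of interest is derived from the latest values.

```python
# Invented example of tracking variable quantities across conversation turns
# and deriving a quantity of interest (here, expected profit) from them.
updates = [
    ("turn 1", {"expected_sales": 120_000, "marketing_expenses": 20_000}),
    ("turn 4", {"salaries": 60_000}),
    ("turn 9", {"marketing_expenses": 35_000}),   # revised later in the conversation
]

state = {}
for _, change in updates:
    state.update(change)                          # keep only the latest value

expected_profit = state["expected_sales"] - state["marketing_expenses"] - state["salaries"]
print(expected_profit)                            # 120000 - 35000 - 60000 = 25000
```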


R^3-VQA: "Read the Room" by Video Social Reasoning

Niu, Lixing, Li, Jiapeng, Yu, Xingping, Wang, Shu, Feng, Ruining, Wu, Bo, Wei, Ping, Wang, Yisen, Fan, Lifeng

arXiv.org Artificial Intelligence

"Read the room" is a significant social reasoning capability in human daily life. Humans can infer others' mental states from subtle social cues. Previous social reasoning tasks and datasets lack complexity (e.g., simple scenes, basic interactions, incomplete mental state variables, single-step reasoning, etc.) and fall far short of the challenges present in real-life social interactions. In this paper, we contribute a valuable, high-quality, and comprehensive video dataset named R^3-VQA with precise and fine-grained annotations of social events and mental states (i.e., belief, intent, desire, and emotion) as well as corresponding social causal chains in complex social scenarios. Moreover, we include human-annotated and model-generated QAs. Our task R^3-VQA includes three aspects: Social Event Understanding, Mental State Estimation, and Social Causal Reasoning. As a benchmark, we comprehensively evaluate the social reasoning capabilities and consistencies of current state-of-the-art large vision-language models (LVLMs). Comprehensive experiments show that (i) LVLMs are still far from human-level consistent social reasoning in complex social scenarios; (ii) Theory of Mind (ToM) prompting can help LVLMs perform better on social reasoning tasks. We provide some of our dataset and codes in supplementary material and will release our full dataset and codes upon acceptance.


They Fell in Love Playing 'Minecraft.' Then the Game Became Their Wedding Venue

WIRED

On a crisp Saturday in March, beneath a canopy of pixelated cherry blossoms, two avatars stood in front of a digital altar crafted from shimmering quartz blocks and flickering redstone torches. They were surrounded by a sprawling Minecraft village, complete with custom-coded NPCs reciting lore about the couple's decade-long digital courtship. Nearby, pixelated foxes darted between guests--each one logged in from across the world, dressed in custom skins as forest druids and rogue mages. After the vows (typed and read aloud on Discord), guests dispersed for side quests, scavenger hunts, and an enchanted maze culminating in a virtual fireworks show. This wasn't a rehearsal for an in-person wedding--this was the wedding.


Revisiting the Predictability of Performative, Social Events

Perdomo, Juan C.

arXiv.org Machine Learning

Social predictions do not passively describe the future; they actively shape it. They inform actions and change individual expectations in ways that influence the likelihood of the predicted outcome. Given these dynamics, to what extent can social events be predicted? This question was discussed throughout the 20th century by authors like Merton, Morgenstern, Simon, and others who considered it a central issue in social science methodology. In this work, we provide a modern answer to this old problem. Using recent ideas from performative prediction and outcome indistinguishability, we establish that one can always efficiently predict social events accurately, regardless of how predictions influence data. While achievable, we also show that these predictions are often undesirable, highlighting the limitations of previous desiderata. We end with a discussion of various avenues forward.
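For readers unfamiliar with the setup, a textbook-style sketch of the performative setting follows; the notation is illustrative and not necessarily the exact definitions used in the paper.

```latex
% Sketch of the performative setting; notation is illustrative, not the paper's.
% Publishing a predictor f : X -> [0,1] induces an outcome distribution D(f),
% since people react to the forecast. f is performatively calibrated if, for
% every forecast value v,
\[
  \mathbb{E}_{(x,y)\sim \mathcal{D}(f)}\left[\, y \mid f(x) = v \,\right] = v .
\]
% Outcome indistinguishability asks more: no test from a fixed family can
% distinguish real outcomes y from outcomes simulated using f under D(f).
```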